Multi-Armed Bandits for Minesweeper: Profiting from Exploration-Exploitation Synergy
Similar Resources
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity, since they are easy to implement, efficient, and effective. This paper considers a variant of the basic algorithm for the stochastic multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found...
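The variance-aware index this abstract describes can be illustrated with a short sketch in the spirit of UCB-V. This is a minimal illustration under assumed constants and a hypothetical Bernoulli reward model, not the paper's exact algorithm; the function names are invented for the example.

```python
import math
import random

def ucbv_index(mean, var, pulls, t, b=1.0):
    # Variance-aware confidence bonus in the spirit of UCB-V;
    # exact constants vary by paper. b is an assumed bound on rewards.
    bonus = math.sqrt(2.0 * var * math.log(t) / pulls) + 3.0 * b * math.log(t) / pulls
    return mean + bonus

def pull(p):
    # Hypothetical reward model: Bernoulli arm with hidden mean p.
    return 1.0 if random.random() < p else 0.0

def run(true_means, horizon):
    k = len(true_means)
    counts, sums, sumsq = [0] * k, [0.0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # pull each arm once to initialize statistics
        else:
            def index(i):
                m = sums[i] / counts[i]
                v = max(sumsq[i] / counts[i] - m * m, 0.0)  # empirical variance
                return ucbv_index(m, v, counts[i], t)
            a = max(range(k), key=index)
        r = pull(true_means[a])
        counts[a] += 1
        sums[a] += r
        sumsq[a] += r * r
    return counts  # pull counts should concentrate on the best arm
```

The point of the variance term is that low-variance arms need a smaller exploration bonus, so fewer pulls are wasted confirming them.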
Decoupling Exploration and Exploitation in Multi-Armed Bandits
We consider a multi-armed bandit problem where the decision maker can explore and exploit different arms at every round. The exploited arm adds to the decision maker’s cumulative reward (without necessarily observing the reward) while the explored arm reveals its value. We devise algorithms for this setup and show that the dependence on the number of arms, k, can be much better than the standar...
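The decoupled setting can be made concrete with a small sketch: each round the learner names one arm to exploit (its reward accrues but is unobserved) and one arm to explore (its value is observed but does not accrue). The greedy/least-sampled rule below is an assumed baseline for illustration, not the paper's algorithm.

```python
import random

def simulate_decoupled(true_means, horizon, seed=0):
    # Each round: exploit the empirical best arm (reward earned, not observed)
    # and explore the least-sampled arm (reward observed, not earned).
    rng = random.Random(seed)
    k = len(true_means)
    counts, sums, cum_reward = [0] * k, [0.0] * k, 0.0
    for _ in range(horizon):
        est = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
        exploit = max(range(k), key=lambda i: est[i])
        explore = min(range(k), key=lambda i: counts[i])
        cum_reward += rng.gauss(true_means[exploit], 1.0)  # earned, unobserved
        obs = rng.gauss(true_means[explore], 1.0)          # observed only
        counts[explore] += 1
        sums[explore] += obs
    return cum_reward
```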
Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits
Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB uses an arm's conditional value at risk as its measure of quality. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MaRaB tends toward the MIN mu...
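Conditional value at risk, the arm-quality measure named here, is the mean of the worst alpha-fraction of outcomes; a minimal empirical version is sketched below. The sample data and the greedy ranking are invented for the example; MaRaB builds its selection rule on this quantity.

```python
import math

def empirical_cvar(samples, alpha):
    # Mean of the worst alpha-fraction of observed rewards (lower tail).
    # As alpha -> 0 this tends to the minimum observed reward, mirroring
    # the essential-infimum limit the abstract describes.
    n_tail = max(1, math.ceil(alpha * len(samples)))
    tail = sorted(samples)[:n_tail]
    return sum(tail) / n_tail

# Hypothetical usage: rank arms by empirical CVaR at risk level alpha = 0.3.
rewards_by_arm = {"a": [0.9, 0.1, 0.8], "b": [0.5, 0.5, 0.6]}
best = max(rewards_by_arm, key=lambda arm: empirical_cvar(rewards_by_arm[arm], 0.3))
# The risk-averse criterion prefers arm "b": steadier, with a better worst case.
```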
Distributed Exploration in Multi-Armed Bandits
We study exploration in Multi-Armed Bandits in a setting where k players collaborate in order to identify an ε-optimal arm. Our motivation comes from recent employment of bandit algorithms in computationally intensive, large-scale applications. Our results demonstrate a non-trivial tradeoff between the number of arm pulls required by each of the players, and the amount of communication between ...
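As a concrete reference point for the pulls-versus-communication tradeoff, the sketch below shows the simplest collaborative baseline: every player explores all arms independently and a single round of communication aggregates their votes. This is an assumed baseline for the setting, not the algorithm or bounds from the paper.

```python
import random

def distributed_best_arm(true_means, pulls_per_arm, k_players, seed=0):
    # One-round-of-communication baseline: each player samples every arm,
    # reports its empirical best, and the coordinator takes a majority vote.
    rng = random.Random(seed)
    n_arms = len(true_means)
    votes = [0] * n_arms
    for _ in range(k_players):
        est = []
        for mu in true_means:
            samples = [rng.gauss(mu, 1.0) for _ in range(pulls_per_arm)]
            est.append(sum(samples) / pulls_per_arm)
        votes[est.index(max(est))] += 1
    return votes.index(max(votes))  # candidate epsilon-optimal arm
```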
Journal
Journal title: IEEE Transactions on Games
Year: 2021
ISSN: 2475-1502, 2475-1510
DOI: 10.1109/tg.2021.3082909